Welcome to my Manhattan Mobility Study! We are going to take a dive into the Big Apple and try to get a better understanding of two of its forms of transportation: a bike-sharing system and Uber rides.
How is each of them used throughout the day? Is there a big difference in their use between weekends and weekdays? What are the favorite districts of cyclists? What are the dynamics of a neighborhood that is very popular among tourists? What about local workers?
We are going to try to answer some of these questions using Python, with a little help from folium, a very powerful library that will help us make some beautiful maps! You can see the full project on my GitHub.
## manipulation
import pandas as pd
import numpy as np
import datetime as dt
import geopy.distance
## dataviz
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import matplotlib.dates as mdates
from matplotlib.colors import ListedColormap
import branca
import branca.colormap as cm
## maps
import folium
from folium.plugins import HeatMap
import geopandas as gpd
## misc
import imageio
import os
import time
from selenium import webdriver
from itertools import product
## Colors
c_gray = '#414647'
c_darkblue = '#113774'
c_darkblue2 = '#0e4f66'
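Among the imports, geopy.distance hints that trip distances will be computed from coordinates later on. As a self-contained illustration of that kind of computation, here is a pure-stdlib haversine, a hypothetical stand-in for geopy.distance.geodesic (which uses an ellipsoidal model and is slightly more accurate):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # great-circle distance in km between two (lat, lon) points
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# first trip in the sample: MacDougal St & Washington Sq -> Clinton Ave & Myrtle Ave
d = haversine_km(40.732264, -73.998522, 40.693261, -73.968896)
```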
### Functions
def classify_time_group(time):
    """Receives an integer representing the hour, and classifies it as 'morning', 'afternoon' or 'night'."""
    if time <= 5:
        return 'night'
    elif time <= 11:
        return 'morning'
    elif time <= 18:
        return 'afternoon'
    else:
        return 'night'
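As a quick sanity check of the bucketing above, the same classification can be expressed with pd.cut — a sketch with the bin edges taken from the function (duplicate labels require ordered=False):

```python
import pandas as pd

hours = pd.Series([3, 8, 14, 22])
# bin edges mirror classify_time_group: <=5 night, <=11 morning, <=18 afternoon, else night
buckets = pd.cut(hours, bins=[-1, 5, 11, 18, 23],
                 labels=['night', 'morning', 'afternoon', 'night'],
                 ordered=False)
```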
def generate_scale(values, n):
    """Receives an array of floats, and returns a list with the thresholds of n intervals."""
    maxv = values.max()
    minv = values.min()
    n_range = maxv - minv
    myscale = [(minv + (n_range/(n-1))*i) for i in range(n)]
    return myscale
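generate_scale is effectively a hand-rolled np.linspace: n evenly spaced thresholds from the minimum to the maximum of the values. A quick check of that equivalence on toy numbers:

```python
import numpy as np

values = np.array([2.0, 6.0, 10.0])
n = 7
# n evenly spaced thresholds from values.min() to values.max()
myscale = np.linspace(values.min(), values.max(), n).tolist()
```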
def generate_geojson(df):
    """Receives a dataframe, and returns a geodataframe with some new spatial
    information collected from the nyc_neighborhoods_map, such as the district of the trip."""
    ## create a geodataframe from the dataframe passed
    geodf_manhattan = gpd.GeoDataFrame(
        df, geometry=gpd.points_from_xy(df['start_longitude'], df['start_latitude']),
        crs=nyc_neighborhoods_map.crs).reset_index(drop=True)
    ## join this geodataframe with the nyc_neighborhoods_map, which contains the district limits of NYC
    new_geodf = gpd.sjoin(geodf_manhattan, nyc_neighborhoods_map, predicate='within')
    return new_geodf
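To see what the sjoin inside generate_geojson does, here is a minimal sketch with a toy square "district" and two trip starts; only the point falling within the polygon survives the join (the district name here is made up for illustration):

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

# a toy 1x1 "district" and two trip starts: one inside, one far away
districts = gpd.GeoDataFrame({'ntaname': ['Toyville']},
                             geometry=[Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])],
                             crs='EPSG:4326')
trips = pd.DataFrame({'start_longitude': [0.5, 5.0], 'start_latitude': [0.5, 5.0]})
geo_trips = gpd.GeoDataFrame(
    trips, geometry=gpd.points_from_xy(trips['start_longitude'], trips['start_latitude']),
    crs=districts.crs)
# only the point inside the square remains, carrying the district name with it
joined = gpd.sjoin(geo_trips, districts, predicate='within')
```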
def generate_choropleth_map(df, value_to_plot, legend='', alias_label='', div=7, title=None):
    """Receives a dataframe with information aggregated by district, and returns a choropleth map."""
    # Create a geodataframe, joining the df received and the nyc map
    geodf_final = pd.merge(nyc_neighborhoods_map, df, on='ntaname')
    # Create the map in the background
    map_to_plot = folium.Map(location=[y_map, x_map], zoom_start=12, tiles=None)
    folium.TileLayer('CartoDB positron', name="Light Map", control=False).add_to(map_to_plot)
    # set some parameters of the choropleth map
    style_function = lambda x: {'fillColor': '#ffffff',
                                'color': '#000000',
                                'fillOpacity': 0.1,
                                'weight': 0.1}
    highlight_function = lambda x: {'fillColor': '#000000',
                                    'color': '#000000',
                                    'fillOpacity': 0.50,
                                    'weight': 0.1}
    # create the color scale thresholds
    myscale = generate_scale(geodf_final[value_to_plot], div)
    # generate the choropleth map
    folium.Choropleth(
        geo_data=geodf_final,
        name='Choropleth',
        data=geodf_final,
        columns=['ntaname', value_to_plot],
        key_on="feature.properties.ntaname",
        fill_color='YlGnBu',
        threshold_scale=myscale,
        fill_opacity=1,
        line_opacity=0.5,
        legend_name=legend,
        smooth_factor=0).add_to(map_to_plot)
    # create the title of the map (optional)
    if title is not None:
        title_html = '''
        <h3 align="center" style="font-size:19px"><b>{}</b></h3>
        '''.format(title)
        map_to_plot.get_root().html.add_child(folium.Element(title_html))
    # create the interactive tooltip of the map
    if alias_label != '':
        fields = ['ntaname', value_to_plot]
        aliases = ['Neighborhood:', str(alias_label) + ':']
    else:
        fields = ['ntaname']
        aliases = ['Neighborhood:']
    NIL = folium.features.GeoJson(
        geodf_final,
        style_function=style_function,
        control=False,
        highlight_function=highlight_function,
        tooltip=folium.features.GeoJsonTooltip(
            fields=fields,
            aliases=aliases,
            style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;")
        )
    )
    map_to_plot.add_child(NIL)
    map_to_plot.keep_in_front(NIL)
    folium.LayerControl().add_to(map_to_plot)
    return map_to_plot
We are going to collect data on 3 different subjects: bike trips, Uber rides, and the shapefile with the neighborhood limits of Manhattan.
The bike trips data were collected from the website of Citi Bike, the most important bike-sharing system in New York. We'll analyze over 1.4 million trips, with plenty of information on each one, such as duration, departure and arrival station, which plan the user had, their age, etc.
We'll be able to explore over 700,000 Uber trips, thanks to the FiveThirtyEight portal, which hosts some very interesting datasets and studies. This data was obtained from the NYC Taxi & Limousine Commission (TLC) through a request supported by the Freedom of Information Law.
The NYC Open Data website contains a lot of useful information about the city, provided and maintained by agencies and the city office. We can find data about education, business, environment, city landmarks, health, you name it… it is even possible to find census data on the squirrels of Central Park.
From this website, we were able to download the shapefile of the neighborhood limits, which will be very helpful in our spatial analysis.
We'll start with the bike trips.
## import data from Citi Bike
bike_df = pd.read_csv('201809-citibike-tripdata.csv')
## create useful features to facilitate our analysis
bike_df = bike_df.reset_index().rename({'index':'trip_id'}, axis = 1).drop(['bikeid', 'tripduration', 'birth year', 'gender'], axis = 1)
bike_df.columns = ['trip_id', 'start_time', 'stop_time', 'start_station_id',
'start_station_name', 'start_latitude',
'start_longitude', 'end_station_id', 'end_station_name',
'end_latitude', 'end_longitude', 'user_type']
bike_df['start_time'] = pd.to_datetime(bike_df['start_time'])
bike_df['trip_date'] = pd.to_datetime(bike_df['start_time'].astype(str).str[:10])
bike_df['dow_trip'] = bike_df['trip_date'].dt.day_name()
bike_df['is_weekend'] = np.where(bike_df['dow_trip'].isin(['Sunday', 'Saturday']),True,False)
bike_df['trip_hour'] = pd.to_datetime(bike_df['start_time']).dt.hour
bike_df['time_of_day'] = bike_df['trip_hour'].apply(classify_time_group)
display(bike_df.head())
print("Number of bike trips from the original dataset:", bike_df.shape[0])
| | trip_id | start_time | stop_time | start_station_id | start_station_name | start_latitude | start_longitude | end_station_id | end_station_name | end_latitude | end_longitude | user_type | trip_date | dow_trip | is_weekend | trip_hour | time_of_day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2018-09-01 00:00:05.269 | 2018-09-01 00:27:20.6340 | 252.0 | MacDougal St & Washington Sq | 40.732264 | -73.998522 | 366.0 | Clinton Ave & Myrtle Ave | 40.693261 | -73.968896 | Subscriber | 2018-09-01 | Saturday | True | 0 | night |
| 1 | 1 | 2018-09-01 00:00:11.281 | 2018-09-01 00:02:23.4810 | 314.0 | Cadman Plaza West & Montague St | 40.693830 | -73.990539 | 3242.0 | Schermerhorn St & Court St | 40.691029 | -73.991834 | Subscriber | 2018-09-01 | Saturday | True | 0 | night |
| 2 | 2 | 2018-09-01 00:00:20.649 | 2018-09-01 00:55:58.5470 | 3142.0 | 1 Ave & E 62 St | 40.761227 | -73.960940 | 3384.0 | Smith St & 3 St | 40.678724 | -73.995991 | Subscriber | 2018-09-01 | Saturday | True | 0 | night |
| 3 | 3 | 2018-09-01 00:00:21.746 | 2018-09-01 00:07:38.5830 | 308.0 | St James Pl & Oliver St | 40.713079 | -73.998512 | 3690.0 | Park Pl & Church St | 40.713342 | -74.009355 | Subscriber | 2018-09-01 | Saturday | True | 0 | night |
| 4 | 4 | 2018-09-01 00:00:27.315 | 2018-09-01 02:21:25.3080 | 345.0 | W 13 St & 6 Ave | 40.736494 | -73.997044 | 380.0 | W 4 St & 7 Ave S | 40.734011 | -74.002939 | Customer | 2018-09-01 | Saturday | True | 0 | night |
Number of bike trips from the original dataset: 1877884
## Checking null values
display(bike_df.isnull().sum())
trip_id                  0
start_time               0
stop_time                0
start_station_id       716
start_station_name     716
start_latitude           0
start_longitude          0
end_station_id         716
end_station_name       716
end_latitude             0
end_longitude            0
user_type                0
trip_date                0
dow_trip                 0
is_weekend               0
trip_hour                0
time_of_day              0
dtype: int64
We have 716 rows without the origin/destination station references. This is a very small number, so let's just remove them.
## Removing null values
bike_df = bike_df.dropna()
Now, let's read the data from uber trips.
## import data from uber
uber_df = pd.read_csv('uber-raw-data-sep14.csv')
## create useful features to facilitate our analysis
uber_df = uber_df.reset_index().rename({'index':'trip_id'}, axis = 1)
uber_df.columns = ['trip_id', 'start_time', 'start_latitude', 'start_longitude', 'base']
uber_df['start_time'] = pd.to_datetime(uber_df['start_time'])
uber_df['trip_date'] = pd.to_datetime(uber_df['start_time'].astype(str).str[:10])
uber_df['dow_trip'] = uber_df['trip_date'].dt.day_name()
uber_df['is_weekend'] = np.where(uber_df['dow_trip'].isin(['Sunday', 'Saturday']),True,False)
uber_df['trip_hour'] = pd.to_datetime(uber_df['start_time']).dt.hour
uber_df['time_of_day'] = uber_df['trip_hour'].apply(classify_time_group)
display(uber_df.head())
print("Number of uber trips from the original dataset:", uber_df.shape[0])
| | trip_id | start_time | start_latitude | start_longitude | base | trip_date | dow_trip | is_weekend | trip_hour | time_of_day |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2014-09-01 00:01:00 | 40.2201 | -74.0021 | B02512 | 2014-09-01 | Monday | False | 0 | night |
| 1 | 1 | 2014-09-01 00:01:00 | 40.7500 | -74.0027 | B02512 | 2014-09-01 | Monday | False | 0 | night |
| 2 | 2 | 2014-09-01 00:03:00 | 40.7559 | -73.9864 | B02512 | 2014-09-01 | Monday | False | 0 | night |
| 3 | 3 | 2014-09-01 00:06:00 | 40.7450 | -73.9889 | B02512 | 2014-09-01 | Monday | False | 0 | night |
| 4 | 4 | 2014-09-01 00:11:00 | 40.8145 | -73.9444 | B02512 | 2014-09-01 | Monday | False | 0 | night |
Number of uber trips from the original dataset: 1028136
## Checking null values
display(uber_df.isnull().sum())
trip_id            0
start_time         0
start_latitude     0
start_longitude    0
base               0
trip_date          0
dow_trip           0
is_weekend         0
trip_hour          0
time_of_day        0
dtype: int64
That's great, we don't have any null values in the Uber trips!
Before moving on, there's one more thing we need to do:
We need to filter the trips that started in Manhattan. With our current datasets alone that would be impossible, since we only have the latitude/longitude references.
Luckily, we have a shapefile that contains the limits of the city and its neighborhoods; not only will it allow us to filter the trips that started in Manhattan, but it will also help us a lot in our geospatial analysis section.
## Reading the shapefile as a geodataframe
nyc_neighborhoods_map = gpd.read_file('geo_export_cf318e70-82ae-47ad-aef2-71ebd6c88bf4.shp')
## Filtering Manhattan in the geodataframe
nyc_neighborhoods_map = nyc_neighborhoods_map[nyc_neighborhoods_map['boro_name'] == 'Manhattan']
## Creating a geojson for each dataset, that contains all the trips, and geospatial data obtained from the shapefile
bike_geojson = generate_geojson(bike_df)
uber_geojson = generate_geojson(uber_df)
## Now, back to our original df, we'll filter only the trips that started in Manhattan
bike_df = bike_df[bike_df['trip_id'].isin(bike_geojson['trip_id'])]
uber_df = uber_df[uber_df['trip_id'].isin(uber_geojson['trip_id'])]
print("Number of uber trips from the new dataset:", uber_df.shape[0])
print("Number of bike trips from the new dataset:", bike_df.shape[0])
Number of uber trips from the new dataset: 760171
Number of bike trips from the new dataset: 1467840
Before we start, let me give an important note: the bike and Uber trips are both from September, but from different years. That fact should always be taken into account when observing the proposed comparisons, and I invite you to always look at them with a critical eye.
For example: when analyzing the numbers presented, it would be inappropriate to conclude that the number of trips by bike is twice the number by Uber, since the absolute number of trips of the two modes certainly varied a lot over those 4 years.
However, it seems fair to assume that the dynamics of the city and some user behaviors have persisted: the time people go to work must not have changed abruptly, the districts popular among Uber users should be roughly the same in 2014 and 2018, and so on. Under that premise, we can make a lot of interesting analyses.
Let’s look at the number of daily trips over time. Since we are analyzing different periods, the idea here is not to compare the absolute numbers, but the overall behaviour: is the number of trips somewhat stable, or do we have very high/low peaks? Do we have a growth pattern in our data? What about weekly seasonality?
sns.set_style('darkgrid')
## creating auxiliary dataframes
bike_trips_per_day = bike_df.groupby(['trip_date']).size().reset_index().rename({0:'trips'}, axis = 1)
bike_trips_per_day['trip_date'] = pd.to_datetime(bike_trips_per_day['trip_date'])
uber_trips_per_day = uber_df.groupby(['trip_date']).size().reset_index().rename({0:'trips'}, axis = 1)
uber_trips_per_day['trip_date'] = pd.to_datetime(uber_trips_per_day['trip_date'])
#Graf 1
plt.figure(figsize=(12,6))
plt.ylim(0,80000)
plt.title("BIKE TRIPS", fontweight = 'bold', fontsize = 15, color = c_gray)
ax = sns.lineplot(data = bike_trips_per_day, x = 'trip_date', y = 'trips', linewidth = 2.2, label = 'bike_trips',
color = 'darkred')
plt.ylabel('TRIPS', fontsize = 12, fontweight = 'bold')
plt.xlabel('\n DATE', fontsize = 12, fontweight = 'bold')
ax.xaxis.set_major_locator(mdates.DayLocator(interval=7))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax.tick_params(axis='y', labelsize=12)
ax.tick_params(axis='x', labelsize=12)
plt.legend(prop={'size': 12})
plt.show()
#Graf 2
print('\n')
plt.figure(figsize=(12,6))
plt.ylim(0,45000)
plt.title("UBER TRIPS", fontweight = 'bold', fontsize = 15, color = c_gray)
ax = sns.lineplot(data = uber_trips_per_day, x = 'trip_date', y = 'trips', linewidth = 2.2, label = 'uber_trips',
color = 'tab:blue')
plt.ylabel('TRIPS', fontsize = 12, fontweight = 'bold')
plt.xlabel('\n DATE', fontsize = 12, fontweight = 'bold')
ax.xaxis.set_major_locator(mdates.DayLocator(interval=7))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax.tick_params(axis='y', labelsize=12)
ax.tick_params(axis='x', labelsize=12)
plt.legend(prop={'size': 12})
plt.show()
Looking at both curves, we can’t see a growth pattern. Also, as we might expect, the number of bike trips has some very low dips, while the Uber trips are more stable. That makes sense, since bike trips can be very sensitive to bad weather.
Furthermore, the Uber rides seem to have a strong weekly seasonality, which is harder to spot in the bike data. Let’s dig a little deeper.
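One way to quantify the "stability" claim above is the coefficient of variation (std / mean) of the daily trip counts, which is comparable across modes with very different volumes. A sketch on hypothetical numbers; on the real data it would be applied to bike_trips_per_day['trips'] and uber_trips_per_day['trips']:

```python
import numpy as np

def cv(series):
    # coefficient of variation: std / mean (smaller = more stable)
    s = np.asarray(series, dtype=float)
    return s.std() / s.mean()

# hypothetical daily counts: a weather-sensitive series vs. a stable one
bike_like = [70000, 20000, 65000, 15000, 68000]
uber_like = [30000, 31000, 29500, 30500, 30200]
```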
sns.set_style('darkgrid')
#Creating auxiliary datasets to help the plotting
bike_trips_per_dow = bike_df.groupby(['dow_trip', 'trip_date','is_weekend']).size().reset_index().rename({0:'trips'}, axis = 1)
uber_trips_per_dow = uber_df.groupby(['dow_trip', 'trip_date', 'is_weekend']).size().reset_index().rename({0:'trips'}, axis = 1)
""" We'll create 2 extra columns to make the plotting better:
One of them is to make sure the day of weeks are in the correct order, and the other is the name abbreviated"""
dict_dow = ({'Monday':0, 'Tuesday':1, 'Wednesday':2, 'Thursday': 3, 'Friday':4, 'Saturday':5, 'Sunday':6})
dict_dow_name = ({'Monday':'Mon', 'Tuesday':'Tue', 'Wednesday':'Wed', 'Thursday': 'Thu', 'Friday':'Fri', 'Saturday':'Sat', 'Sunday':'Sun'})
bike_trips_per_dow['dow_aux'] = bike_trips_per_dow['dow_trip'].map(dict_dow)
bike_trips_per_dow['dow_mini_name'] = bike_trips_per_dow['dow_trip'].map(dict_dow_name)
uber_trips_per_dow['dow_aux'] = uber_trips_per_dow['dow_trip'].map(dict_dow)
uber_trips_per_dow['dow_mini_name'] = uber_trips_per_dow['dow_trip'].map(dict_dow_name)
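An alternative to the dow_aux helper column is pandas' ordered CategoricalDtype, which makes sort_values respect weekday order directly (a sketch):

```python
import pandas as pd

order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dow = pd.Series(['Sunday', 'Monday', 'Saturday'])
# cast to an ordered categorical so sorting follows weekday order, not alphabetical order
dow = dow.astype(pd.CategoricalDtype(categories=order, ordered=True))
sorted_days = list(dow.sort_values())
```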
### Plotting the charts
#Chart 1
plt.figure(figsize=(10,6))
plt.title("BIKE TRIPS", fontweight = 'bold', fontsize = 15, color = c_gray)
ax = sns.barplot(data = bike_trips_per_dow.sort_values(by = 'dow_aux'), x = 'dow_mini_name', y = 'trips',
linewidth = 2.2, color = '#e06666')
plt.ylabel('TRIPS', fontsize = 12, fontweight = 'bold')
plt.xlabel('\n DAY OF WEEK', fontsize = 12, fontweight = 'bold')
ax.tick_params(axis='y', labelsize=12)
ax.tick_params(axis='x', labelsize=12)
print('\n')
#Chart 2
plt.figure(figsize=(10,6))
plt.title("UBER TRIPS", fontweight = 'bold', fontsize = 15, color = c_gray)
ax = sns.barplot(data = uber_trips_per_dow.sort_values(by = 'dow_aux'), x = 'dow_mini_name', y = 'trips',
linewidth = 2.2, color = 'tab:blue')
plt.ylabel('TRIPS', fontsize = 12, fontweight = 'bold')
plt.xlabel('\n DAY OF WEEK', fontsize = 12, fontweight = 'bold')
ax.tick_params(axis='y', labelsize=12)
ax.tick_params(axis='x', labelsize=12)
plt.show()
sns.set_style('darkgrid')
#Creating auxiliary datasets to help the plotting
bike_trips_per_day_type = bike_df.groupby(['trip_date', 'is_weekend']).size().reset_index().rename({0:'trips'}, axis = 1)
bike_trips_per_day_type['day_type'] = np.where(bike_trips_per_day_type['is_weekend']==False, 'weekday', 'weekend')
bike_trips_per_day_type.sort_values(['day_type', 'trip_date'], inplace = True)
uber_trips_per_day_type = uber_df.groupby(['trip_date', 'is_weekend']).size().reset_index().rename({0:'trips'}, axis = 1)
uber_trips_per_day_type['day_type'] = np.where(uber_trips_per_day_type['is_weekend']==False, 'weekday', 'weekend')
uber_trips_per_day_type.sort_values(['day_type', 'trip_date'], inplace = True)
#Chart 1
plt.figure(figsize=(12,6))
plt.title("BIKE TRIPS", fontweight = 'bold', fontsize = 15, color = c_gray)
ax = sns.barplot(data = bike_trips_per_day_type, x = 'day_type', y = 'trips',
palette = 'afmhot')
plt.ylabel('TRIPS', fontsize = 13, fontweight = 'bold')
plt.xlabel('\n DAY TYPE', fontsize = 13, fontweight = 'bold')
ax.tick_params(axis='y', labelsize=13)
ax.tick_params(axis='x', labelsize=13)
plt.show()
print('\n')
#Chart 2
plt.figure(figsize=(12,6))
plt.title("uber TRIPS", fontweight = 'bold', fontsize = 15, color = c_gray)
ax = sns.barplot(data = uber_trips_per_day_type, x = 'day_type', y = 'trips',
palette = 'winter')
plt.ylabel('TRIPS', fontsize = 13, fontweight = 'bold')
plt.xlabel('\n DAY TYPE', fontsize = 13, fontweight = 'bold')
ax.tick_params(axis='y', labelsize=13)
ax.tick_params(axis='x', labelsize=13)
plt.show()
The seasonality is very clear in the Uber rides, with an increasing number of trips throughout the week and small variation. There’s also a certain pattern in the bike trips, but since the variation is bigger, this pattern is harder to spot.
In both modes, Thursday and Friday have a large volume of trips, while Sunday and Monday have the lowest.
The number of trips drops on the weekend. That behaviour is a little surprising to me for the bike trips, as I would expect a lot of weekend trips related to leisure and exercise.
Let’s take a look at how these trips are divided throughout the day, to get a better understanding on why they happen.
Now we are going to see when the trips happen throughout the day, broken down by mode and day type. That might help us understand the dynamics of the city and how New Yorkers rely on each system: when people go to work, when they go home, which mode they use in each situation, whether they make a lot of trips outside the peak hours, etc.
#Creating auxiliary datasets to help the plotting
bike_trips_per_hour = bike_df.groupby(['trip_hour', 'trip_date', 'is_weekend']).size().reset_index().rename({0:'trips'}, axis = 1)
bike_trips_per_hour['day_type'] = np.where(bike_trips_per_hour['is_weekend']==False, 'weekday', 'weekend')
bike_trips_per_hour.sort_values(['day_type', 'trip_date', 'trip_hour'], inplace = True)
uber_trips_per_hour = uber_df.groupby(['trip_hour', 'trip_date', 'is_weekend']).size().reset_index().rename({0:'trips'}, axis = 1)
uber_trips_per_hour['day_type'] = np.where(uber_trips_per_hour['is_weekend']==False, 'weekday', 'weekend')
uber_trips_per_hour.sort_values(['day_type', 'trip_date', 'trip_hour'], inplace = True)
sns.set_style('darkgrid')
#Chart 1
plt.figure(figsize=(12,6))
plt.title("BIKE TRIPS", fontweight = 'bold', fontsize = 15, color = c_gray)
ax = sns.lineplot(data = bike_trips_per_hour, x = 'trip_hour', y = 'trips',
linewidth = 2.2, hue = 'day_type', palette = 'afmhot')# {'weekend':'#e69138', 'weekday':'#660000'}) #winter
plt.ylabel('TRIPS', fontsize = 12, fontweight = 'bold')
plt.xlabel('\n HOUR', fontsize = 12, fontweight = 'bold')
ax.tick_params(axis='y', labelsize=12)
ax.tick_params(axis='x', labelsize=12)
plt.show()
print('\n')
#Chart 2
plt.figure(figsize=(12,6))
plt.title("UBER TRIPS", fontweight = 'bold', fontsize = 15, color = c_gray)
ax = sns.lineplot(data = uber_trips_per_hour, x = 'trip_hour', y = 'trips',
linewidth = 2.2, hue = 'day_type', palette = 'winter')#{'weekend':'#073763', 'weekday':'#a2c4c9'})
plt.ylabel('TRIPS', fontsize = 12, fontweight = 'bold')
plt.xlabel('\n HOUR', fontsize = 12, fontweight = 'bold')
ax.tick_params(axis='y', labelsize=12)
ax.tick_params(axis='x', labelsize=12)
plt.show()
Okay, now that we have a basic understanding of how New Yorkers use each mode, let's delve into the subject through a set of geospatial analyses.
Now we'll try to find patterns by looking at the districts of the city. What are the most popular neighborhoods for cyclists? Are these neighborhoods also popular among Uber users? What are the dynamics of neighborhoods with many tourist attractions? Are the popular districts on weekends the same as on weekdays? Which neighborhoods have the busiest nightlife?
We are going to generate some maps with a little help from the folium library.
## setting the center of the map
y_map = 40.7612
x_map = -73.9757
center = (y_map,x_map)
First of all, let's take a look at where the Citi Bike stations are placed. To do that, we'll plot a bubble map, where each bubble indicates the location of a station, and its size represents the number of trips initiated at that station.
### create a dataset with the average of daily trips per station
df_bike_trips_per_station_1 = bike_geojson.groupby(['start_station_name', 'start_latitude',
'start_longitude', 'trip_date']).size().to_frame().reset_index().rename({0:'trips'}, axis = 1)
df_bike_trips_per_station_2 = df_bike_trips_per_station_1.groupby(['start_station_name', 'start_latitude',
'start_longitude']).mean()['trips'].to_frame().reset_index()
### Creating the map
## generating the background
bubble_map_daily_trips = folium.Map(center,zoom_start=12)
## generating the bubbles
for index, row in df_bike_trips_per_station_2.iterrows():
    folium.Circle(location = [row['start_latitude'], row['start_longitude']],
                  radius = row['trips']/4,
                  fill = True,
                  color = 'blue',
                  tooltip = row['start_station_name'] + ' (' + str(int(round(row['trips'],0))) + ' trips per day)',
                  fill_opacity = 0.5).add_to(bubble_map_daily_trips)
bubble_map_daily_trips
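The two-step groupby that feeds this map (trips per station per day, then the mean across days) can be illustrated on a toy trip log — station names here are made up:

```python
import pandas as pd

# hypothetical trip log: two stations, two days
log = pd.DataFrame({
    'station': ['A', 'A', 'A', 'B'],
    'date': ['2018-09-01', '2018-09-01', '2018-09-02', '2018-09-01'],
})
# step 1: count trips per station per day
per_day = log.groupby(['station', 'date']).size().rename('trips').reset_index()
# step 2: average those daily counts per station
avg_daily = per_day.groupby('station')['trips'].mean()
# station A: (2 + 1) / 2 = 1.5 daily trips; station B: 1.0
```

Note that days with zero trips at a station don't appear in the log, so they don't pull the average down; the same caveat applies to the real data.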
From the previous map, we can get a good idea of how the stations are distributed, and where the most popular regions among cyclists are. For Uber rides, we can't use the same kind of map, since trips can start from basically anywhere. So, we need another kind of representation to analyze Uber trips. We'll use a heatmap!
## creating a numpy array with the latitude and longitude
uber_np = uber_geojson[['start_latitude','start_longitude']].to_numpy()
#Generating the heatmap
uber_heatmap = folium.Map(center,zoom_start=12)
folium.plugins.HeatMap(uber_np,radius=12).add_to(uber_heatmap)
folium.LayerControl().add_to(uber_heatmap)
uber_heatmap